THE COMPLEXITY OF MULTIPLE-PRECISION 
ARITHMETIC 1 



Richard P Brent 



Computer Centre, Australian National University 



In studying the complexity of iterative processes it is usually assumed that the arith- 
metic operations of addition, multiplication, and division can be performed in certain 
constant times. This assumption is invalid if the precision required increases as the 
computation proceeds. We give upper and lower bounds on the number of single- 
precision operations required to perform various multiple-precision operations, and 
deduce some interesting consequences concerning the relative efficiencies of methods 
for solving nonlinear equations using variable-length multiple-precision arithmetic. 



1 Introduction 

Traub [28] defines analytic computational complexity to be the optimality theory of analytic or 
continuous processes. Apart from some work by Schultz [24] on differential equations, most re- 
cent results have concerned iterative methods for the solution of nonlinear equations or systems 
of equations. See, for example, Brent [1,3,6], Brent, Winograd and Wolfe [7], Kung [14,15], 
Kung and Traub [16-17], Paterson [21], Rissanen [22], Traub [28-32] and Wozniakowski [35,36]. 

The authors just cited make the (usually implicit) assumption that arithmetic is performed with 
a fixed precision throughout a given computation. This is probably true for most computations 
programmed in Fortran or Algol 60. Suppose, though, that we are concerned with an iterative 
process for approximating an irrational number £ (for example, \/2, it or e) to arbitrary ac- 
curacy. The iterative process should (theoretically) generate a sequence (xi) of real numbers, 
such that ( = nm x ii provided no rounding errors occur. On a computing machine each X{ 

i— >oo 

has to be approximated by a finite-precision machine-representable number x,, and C = hm Xi 

i—>oc 

can only hold if the precision increases indefinitely as i — > oo. In practice, only a finite number 
of members of the sequence (xj) will ever be generated, but if an accurate approximation to 
C is required it may be possible to save a large amount of computational work by using vari- 
able precision throughout the computation. This is likely to become easier to program as new 
languages (and possibly hardware), which allow the precision of floating-point numbers to be 
varied dynamically, are developed. 



First appeared in The Complexity of Computational Problem Solving (edited by R S Anderssen and R P 
Brent), Univ. of Queensland Press, 1976, 126-165. Retyped with minor corrections by Frances Page at Oxford 
University Computing Laboratory, 1999. 

Copyright © 1976-2010, R. P. Brent. rpb032 typeset using I^T^X. 



In Section 7 we discuss the effect of using variable precision when solving nonlinear equations. 
Before doing so, we consider the complexity of the basic multiple-precision arithmetic opera- 
tions. We assume that a standard floating-point number representation is used, with a binary 
fraction of n bits. (Similar results apply for any fixed base, for example, 10.) We are interested 
in the case where n is much greater than the wordlength of the machine, so the fraction occupies 
several words. For simplicity, we assume that the exponent field has a fixed length and that 
numbers remain in the allowable range, so problems of exponent overflow and underflow may 
be neglected. Note that our assumptions rule out exotic number representations (for example, 
logarithmic [4] or modular [33, 34] representations) in which it is possible to perform some (but 
probably not all) of the basic operations faster than with the standard representation. To rule 
out "table-lookup" methods, we assume that a random-access memory of bounded size and a 
bounded number of sequential tape units are available. (Formally, our results apply to multitape 
Turing machines.) 

In Sections 2 to 6 we ignore "constant" factors, that is factors which are bounded as n — > oo. 
Although the constant factors are of practical importance, they depend on the computer and 
implementation as well as on details of the analysis. Certain machine-independent constants are 
studied in Sections 7 and 8. 

If B is a multiple-precision operation, with operands and result represented as above (that is, 
"precision n" numbers), then t n (B) denotes the worst-case time required to perform B, ob- 
taining the result with a relative error at most 2™ n c, where c is independent of n. We assume 
that the computation is performed on a serial machine whose single-precision instructions have 
certain constant execution times. The following definition follows that in Hopcroft [11]. 

Definition 1.1 B is linearly reducible to C (written B =C), if there is a positive constant K 
such that 

t n {B) < Kt n {C) (1.1) 

for all sufficiently large n. B is linearly equivalent to C (written B = C) if B = C and C = B. 

In Section 2 we consider the complexity of multiple-precision addition and some linearly equiv- 
alent operations. Then, in Section 3, we show that multiple-precision division, computation of 
squares or square roots, and a few other operations are linearly equivalent to multiplication. 
Most of these results are well known [8, 9]. 

Sections 4 and 5 are concerned with the "operations" of evaluating exponentials, logarithms, 
and the standard trigonometric and hyperbolic functions (sin, artan, cosh, and so on). It turns 
out that most of (and probably all) these operations are linearly equivalent so long as certain 
restrictions are imposed. 

Section 6 deals with the relationship between the four equivalence classes established in Sections 
2 to 5, and several upper bounds on the complexity of operations in these classes are given. 
The best known constants relating operations which are linearly equivalent to multiplication are 
given in Section 7. 

Finally, in Section 8, we compare the efficiencies of various methods for solving nonlinear equa- 
tions using variable-length multiple-precision arithmetic. The relative efficiencies are different 
from those for the corresponding fixed-precision methods, and some of the conclusions may be 
rather surprising. The results of Sections 4 to 8 are mainly new. 

In the analysis below, ci,C2,... denote certain positive constants which do not need to be 
specified further. The notation / ~ g means that lim f(n)/g(n) = 1, and / = O(g) means that 
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|/(n)| < Kg(n) for some constant K and all sufficiently large n. Finally, the abbreviation ll mp" 
stands for "variable-length multiple-precision" . 

2 Addition and linearly equivalent operations 

Let A denote the operation of multiple-precision addition. Any reasonable implementation of 
floating-point addition, using at least one guard digit to avoid the possible occurrence of large 
relative errors, gives 

t n (A) < cm . (2.1) 

Conversely, from the assumptions stated in Section 1, it is clear that 

t n {A) > c 2 n . (2.2) 

Hence, the complexity of multiple-precision addition is easily established. (For the operations 
discussed in Sections 3 and 5 the results are less trivial, in fact the conjectured lower bounds 
corresponding to (2.2) have not been proved rigorously.) 

It is easy to see that bounds like (2.1) and (2.2) hold for multiple-precision subtraction, and 
multiplication or division of a multiple-precision number by a single-precision number (or even 
by any rational number with bounded numerator and denominator). Hence, all these operations 
are linearly equivalent to addition. 

3 Multiplication and linearly equivalent operations 

Let D, I, M, R and S denote the multiple-precision operations of division, taking reciprocals, 
multiplication, extraction of square roots and forming squares, respectively. In this section, we 
show that all these operations are linearly equivalent. The proofs are straightforward, but the 
result is surprising, as it seems intuitively obvious that taking a square root is inherently "more 
difficult" than forming a square, and similarly for division versus multiplication. (Some bounds 
on the relative difficulty of these operations are given in Section 7.) 

Lemma 3.1 

M^S^A. (3.1) 

Proof. Clearly 

t n (M) > t n (S) > c 3 n , (3.2) 

so the result follows from (2.1). 

Sharp upper bounds on t n {M) are not needed in this section, so we defer them until Section 6. 
Lemmas 3.2 and 3.3, although weak, are sufficient for our present purposes. 

Lemma 3.2 For all positive n, 

*2n(M) < c 4 t n (M) . (3.3) 

Proof. First assume that n is divisible by 3, and consider operations on the n-bit fractions 
only. If we can multiply n-bit numbers with relative error 2~ n co then we can multiply n/3-bit 
numbers exactly (assuming 2 n / 3 > 2cq). Thus, a 2n-bit fraction x may be split up into 

x = Xa + X 2 b + . . . + A 6 / , (3.4) 

where A = 2 _ri//3 and a,b,...,f are integers in [0, 2 n / 3 ), and the product of two such 2n-bit 
fractions may be formed exactly with 36 exact multiplications of n/3-bit numbers and some 
additions. Thus 
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t 2n (M) < m n {M) + c 5 t 2n (A) , (3.5) 

and the result follows from Lemma 3.1. Trivial modifications to the above proof suffice, if n is 
not divisible by 3. 

Lemma 3.3 For some constant cq < 1, 

t n (M) < c 6 t 8n (M) (3.6) 

for all sufficiently large n. 

Proof. If a, b, c and d are integers in [0, 2 n ), the identity 

(a + Xb) (c + Ad) = ac+ X(bc + ad) + X 2 bd , (3.7) 

with A = 2 3n , may be used to obtain the products ac and bd from one 8n-bit product. Thus 

2t n (M) <t 8n {M) + c 7 . (3.8) 

The result (with cq = 3/4) follows if n is sufficiently large that t 8n {M) > 2ci. (We have assumed 
that the time required for one n-bit multiplication is half the time required for two independent 
n-bit multiplications, but much weaker assumptions would be sufficient.) 

The following lemma will be used to estimate the work required for multiple-precision divisions 
and square roots. 

Lemma 3.4 Given a G (0,1), there is a constant c% such that, for any integers no,...,n p 
satisfying 

\<nj< a j n (3.9) 

for j = 0, 1, . . . ,p, we have 

v 

J^tnjiM) < c 8 t n (M) . (3.10) 

3=0 

Proof. Let A; be large enough that 

a k <l/8. (3.11) 

From (3.9) and (3.11), 

t njk (M)<4t n (M) (3.12) 

for j = 0, 1, ... , [p/k\ , provided njk is sufficiently large for Lemma 3.3 to be applicable. Thus, 

p 

XA( M ) ^ kt n( M ) (! + c 6 + 4 + ...)+ c 7 , (3.13) 

3=0 

where the term c 7 allows for those t nj (M) for which Lemma 3.3 is not applicable. If 

c 8 = k/{l-c 6 ) + c 7 , (3.14) 

the result follows from (3.13). 

The following lemma shows that multiple-precision multiplication is linearly equivalent to squar- 
ing. This result is essentially due to Floyd [9]. 
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Lemma 3.5 

M = S . (3.15) 

Proof. Since squaring is a special case of multiplication, 

M^S. (3.16) 

Conversely, we may use the identity 

4Xab = (a + Xb) 2 - {a - Xb) 2 , (3.17) 
where A is a power of 2 chosen so that 

\ < \Xb/a\ < 2 (3.18) 

(unless a = or b = 0). This scaling is necessary to avoid excessive cancellation in (3.17). 
(A detailed discussion of a similar situation is given in Brent [5].) From (3.17), 

t n (M) < 2t n (S) + 3t n {A) + c 9 , (3.19) 

so M = S follows from Lemma 3.1. 

The next two lemmas show that multiple-precision multiplication is linearly equivalent to taking 
reciprocals and to division. The idea of the proof of Lemma 3.6 is to use a Newton iteration 
involving only multiplications and additions to approximate 1/a. Computational work is saved 
by starting with low precision and approximately doubling the precision at each iteration. The 
basic idea is well-known and has even been implemented in hardware. 

The possibility of saving work by increasing the precision at each iteration is examined more 
closely in Sections 7 and 8. 



Lemma 3.6 



Proof. Consider the iteration 



I^D^M. (3.20) 



x j+1 = Xj (2 - axj) (3.21) 
obtained by applying Newton's method to the equation x~ l — a = 0. If 

x j = (l-e j )a- 1 , (3.22) 

then substitution in (3.21) shows that 

e j+ i = e) , (3.23) 

so the order of convergence is two. A single-precision computation is sufficient to give an initial 
approximation such that |eo| < \, and it follows from (3.23) that 

M < 2~ 23 (3.24) 

for all j > 0. 

In deriving (3.24) we have assumed that (3.21) is satisfied exactly, but a result like (3.24) holds 
so long as the right hand side of (3.21) is evaluated using a precision of at least 2 J+1 bits. Thus, 
an n-bit approximation to a" 1 can be obtained by performing |~log 2 n] iterations of (3.22) with 
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precision at least 2,2 2 ,2 3 , . . . , 2^2 n a t each iteration. From Lemma 3.4 (with a = \), 
this gives 

t n (I) < c w t n (M) . (3.25) 

Since b/a = b(l/a), it follows that 

tn(D) < c u t n (M) , (3.26) 
so D = M. Since I = D is trivial, the proof is complete. 

From ab = a/(l/b) it is clear that M = D. The proof that M = I is not quite so obvious, and 
uses the equivalences of multiplication and squaring (Lemma 3.5). 

Lemma 3.7 

M^I. (3.27) 

Proof. We may apply the identity 

a 2 (l - \a)- 1 = \- 2 [(1 - \ a y l - (1 + Ao)] (3.28) 

to obtain an approximation to a 2 , using only the operation of taking reciprocals, addition (or 
subtraction) and multiplication by powers of two. If a ^ 0, choose A to be a power of two such 
that 

2 -n/3-l < | Aa | < 2 l-»/3 j ( 3 . 2 9) 

and evaluate the right hand side of (3.28), using precision n. This gives an approximation to a 2 
with precision [re/3] , so 

S\n/$\ ~ In , (3.30) 

where the subscripts denote the precision. Thus, the result follows from Lemmas 3.2 and 3.5. 

To conclude this section we consider the complexity of multiple-precision square roots. Results 
like Lemmas 3.8 and 3.9 actually hold if is replaced by x p for any fixed rational p / or 1 
(we have already shown this for p = —1). 

Lemma 3.8 

M^R. (3.31) 
Proof. The proof is similar to that of Lemma 3.7, using the approximation 



2A 



-2 



1 + Act - (1 + 2Aa)2 



to a 2 . 



Lemma 3.9 

R^M . (3.32) 

Proof. The proof is similar to that of Lemma 3.6, using Newton's iteration 

Xj + i = \ (xj + a/xj) , (3.33) 

with precision increasing at each iteration, to approximate ^/a. Alternatively, it is possible to 
avoid multiple-precision division by using the iteration 
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x j+ i = Xj (3 - ax]) /2 (3.34) 
to approximate a~2 5 and then use y'a = a.a~2 to evaluate y^. 
The results of Lemmas 3.5 to 3.9 may be summarized in the following: 

Theorem 3.1 

D = I = M = R = S . (3.35) 

4 Some regularity conditions 

Before discussing the complexity of multiple-precision evaluation of exponentials, trigonometric 
functions, etc., we need some definitions. Throughout this section, let <p{x) be a real- valued 
function which is positive and monotonic increasing for all sufficiently large positive x. 

Definition 4.1 <j> £ <&i iff, for all a £ (0, 1), for some positive K, for all sufficiently large x and 
all xq, . . . , xj satisfying 

l<Xj< a j x (4.1) 

for j = 0, ... , J, we have 

J 

< K<Kz) . (4.2) 

i=o 

€ $2 iff) f° r some ct,f3& (0, 1) and all sufficiently large x, 

0(ax) < /30(x) . (4.3) 
G $3 iff, for some positive Ki, Ki and p, there is a monotonic increasing function ip such that 

i^ix p ^(x) < <f>{x) < K 2 x p ip(x) (4.4) 

for all sufficiently large x. 

Note the similarity between the definition of $i and the statement of Lemma 3.4. In Section 5, 
we need to assume that the time 4>(n) required to perform certain operations with precision n 
satisfies (4.2). The following lemmas make this assumption highly plausible. Lemma 4.1 shows 
that "for all a" in the definition of 3>i may be replaced by "for some a". 

Lemma 4.1 //, for some a £ (0,1) and some positive K, for all sufficiently large x and all 
xq, ... ,xj satisfying (4.1), we have (4.2), then <f> G <&i. 

Proof. Take any ct\ and «2 in (0, 1), and suppose that (4.1) with a replaced by a 2 implies (4.2) 
with K replaced by K 2 . Let m be a positive integer such that a™ < ct 2 . If (4.1) holds with a 
replaced by ot\ for a sequence (xo,xi, . . . , xj), then (4.1) also holds with a replaced by a 2 for 
each of the m subsequences 

(.Xq, X rn , . . .), (x\ , X m +i ,...),..., (x m _i , X2m— 1 j • • •) j 

so (4.2) holds with K replaced by K\ = mi^- 

Lemmas 4.2 and 4.3 show that <p £ $2 or <j> ^ i s a sufficient condition for <f> £ $1. The proof 
of Lemma 4.2 is similar to that of Lemma 3.4 (using Lemma 4.1), so is omitted. 
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Lemma 4.2 

$2^1. (4.5) 

Lemma 4.3 

$2 = $3 • (4.6) 

Proof. First suppose that 4> G $3, so (4.4) holds for some function ip and some positive K\,K 2 
and p. Choose a G (0, 1) such that /3 = a p K 2 /K\ < 1. For all sufficiently large x, we have 

0(ax) < K 2 a p x p ^{ax) < K 2 a p x p ip{x) < (K^/K^ <j>(x) < p<f>(x) (4.7) 
(using (4.4) and the monotonicity of ip), so <p G <3?2- 

Conversely, suppose that G $2 ; so (4.3) holds for all sufficiently large x (say x > xo > 0) and 
some cc, /3 £ (0, 1). Choose p small enough that /3 < a p , so 

0(ax) < o?<j>{x) (4.8) 

for x > xq. Since 0(x) is positive for sufficiently large x, we may assume that 4>{xq) > 0. Let 
Ki = a p ,K 2 = 1, and 

V>(x) = sup <j>{y)/y p (4.9) 

for x > xo- Thus, ip(x) is monotonic increasing and 

V>(x) > (j)(x)/x p (4.10) 

so 

0(x) < K 2 x p ip{x) (4.11) 

for x > xo- 

By repeated application of (4.8) we have, for k > 0, 

<f>(x)/x p > 4>{a k x)/{a k x) p (4.12) 

provided a k x > xq. Thus, from (4.9), 

V>(x)= sup (4.13) 
ax<y<x 

< 4>{x)/{ax) p (4.14) 

for x > xo/a. Thus, 

0(x) > K lX p ip{x) (4.15) 

and, in view of (4.11), <j) G $2- 
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5 Linear equivalence of various elementary functions 

In this section, we consider the multiple-precision "operations" of evaluating certain elementary 
functions (log, exp, sin, artan, etc). First we prove three theorems which apply under fairly 
general conditions. Theorem 5.1 is a generalization of Lemmas 3.7 and 3.8, and gives a simple 
condition under which the evaluation of f(x) is at least as difficult as a multiplication (in the 
sense of Definition 1.1). 

NOTATION. If / is a real- valued function defined on some finite interval [a, b], the operation 
of evaluating /(x) to (relative) precision n for x G [a, b] is denoted by (/). If there is no 
risk of confusion, we write simply Ei a ^(f) or E(f). We sometimes write t n (f) for t n (E(f)). 

[a,b] is the class of functions with Lipschitz continuous m-th derivatives on [a, b]. We 
always assume that b > a. 

Theorem 5.1 If f G LC^[a,b] and there is a point xo G (a, b) such that /"(xo) ^ 0, then 

E(f) > M . (5.1) 

Proof. For all sufficiently small h, we have (from [2, Lemma 3.2]) 

/(x + h) + /(x -h)- 2/(x ) = h 2 f"(x ) + R{x) , (5.2) 

where 

\R(x)\ < c 12 \h\ 3 . (5.3) 

Let c = /"(xo) / 0. Three evaluations of / and some additions may be used to approximate 
ch\ using (5.2). If h is of order 2 n / 3 , the resulting approximation to ch? has relative error 
of order 2~ ra / 3 . Proceeding as in the proof of Lemma 3.5, we see that six evaluations of / and 
some additions may be used to approximate cxy to precision n/3, for any x and y. Applying 
this result, with x replaced by the stored constant c~ 2 , and y replaced by the computed cxy, 
shows that 12 evaluations of / give c (c -2 ) {cxy) = xy to precision n/3. The result now follows 
from Lemma 3.2. 

REMARK. If f"(x) is not constant on [a, b], the point xo may be chosen so that f"(x) is 
rational, so (5.2) may be used to approximate h 2 , and the result follows more easily (as in the 
proof of Lemma 3.7). 

Theorem 5.2 gives conditions under which the multiple-precision evaluation of the inverse func- 
tion g = /( _1 ) of a function / is linearly reducible to the evaluation of /. (The inverse function 
satisfies g (/(x)) = x.) The condition [a, b] could be dropped, if we only required the com- 
putation of g with an absolute (rather than relative) error of order 2~ n . 

Theorem 5.2 7/0 [a, b], f G LC^[a,b], f'(x) ^ on [a,b], E(f)^M, and 

tn(f) G $i , (5.4) 

then 

E(g)±E(f), (5.5) 
where g = Z^" 1 - 1 and $i is as in Definition 4-1- 
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Proof. Since f(x) is continuous and nonzero on [a, b], there is no loss of generality in assuming 
that 

fix) > C13 > (5.6) 
on [a, b]. Thus, g(y) exists on [c,d] = [f(a),f(b)}. Also, since / [a, b], we have 

\g(y)\ > cu > (5.7) 

on [c, d] . 

To estimate g(y) we may solve ^(x) = by a discrete version of Newton's method, where 

1>(x) = f(x)-y. (5.8) 

Consider the iteration 

Xj+i = Xj - ip(xj)/nj , (5.9) 

where 

Hj = (ip(xj + hj) - ip(xj)) /hj , (5.10) 

and the computation of fij and Xj+i is performed with precision rij < n, giving computed values 
Jij and x~j+i respectively. If hj is of order 2~ n ^ 2 , then 

\llj-^{xj)\ < 2-^/ 2 c 15 , (5.11) 

and it is easy to show that 

\x j+1 - g(y)\ < |% - g(y)\ 2 c 16 + T n ^ 2 |% - g(y)\ c 17 + 2~ n ' c 18 . (5.12) 

Since a sufficiently good starting approximation xo may be found using single-precision (or at 
most bounded-precision) computation, (5.12) ensures that 

\xj+i - g(y)\ < \xj - g{y)\ 2 ciQ , (5.13) 

provided 

\xj-g(y)\ > 2"^/ 2 . (5.14) 

Hence, we may approximately double the precision at each iteration, and (5.13) guarantees 
convergence of order two. A final iteration with hj = 2~ n / 2 will be sufficient to give 

\x j+1 -g(y)\ < 2~ n c 20 • (5.15) 

Since E(f) > M, the result follows from (5.4), (5.7), (5.15), and Lemma 3.6. 

Theorem 5.3 I/O [a,b], f G LC^[a,b], f(x)f(x) / on [a,b], g = f~ l \ E(f)^M, 
E{g) = M, t n (f) £ and t n (g) £ then 

E(f) = E(g) . (5.16) 

Proof. Since t n (f) G $i, Theorem 5.2 applied to / gives E(g) =E(f). Similarly, applying 
Theorem 5.2 to f^ 1 ^ gives E(f) =E(g), so the result follows. 

We are now ready to deduce the linear equivalence of mp evaluation of various elementary func- 
tions fi, assuming that t n (fi) G $i. In view of Lemmas 4.2 and 4.3, this assumption is very 
plausible. 
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Corollary 5.1 // < a < b, c < d, 1 [a, b], t n (E [aM (log)) G $i, and t n (£ M (exp)) G $ i; 
then 

%b] (log) = S M (exp) . (5.17) 

Proof. From Theorem 5.1, £[ afe ](log) =M and (exp) = M. Also, the identities 

exp(— x) = 1/ exp(x) (5.18) 

and 

exp(Ax) = (exp(x)) A (5.19) 

(for suitable rational A) may be used to show that E[ cd ](exp) = E[ c / d /] (exp) for any d < d'. 
Hence, the result follows from Theorem 5.3. 

REMARK. If 1 G [a, b], then Theorem 5.2 shows that 

and a proof like that of Theorem 5.2 shows that 

so the conclusion of Corollary 5.1 follows, if 

4 2 ;J(exp) = 4^(exp). (5.22) 

Although (5.22) is plausible, no proof of it is known. (The corresponding result for multiplica- 
tion is given in Lemma 3.2.) 



Corollary 5.2 



E(sinh) = E(cosh) = E(tanh) = E(arsinh) 

= E(arcosh) = E(artanh) = E(exp) = E(log) 



(5.23) 

on any nontrivial closed intervals on which the respective functions are bounded and nonzero, 
assuming t n (sinh) G <&i etc. 



Corollary 5.3 



E(sin) = E(cos) = E(tan) = E(arsin) = E(arcos) = E(artan) (5.24) 



on any nontrivial closed intervals on which the respective functions are bounded and nonzero, 
assuming t ra (sin) G $1 etc. 

REMARKS. The proofs of Corollaries 5.2 and 5.3 are similar to that of Corollary 5.1 (using 
well-known identities), so are omitted. Since exp(zx) = cos(x) + isin(x), it is plausible that 
E(exp) = E'(sm), but we have not proved this. (It is just conceivable that the evaluation of 
exp(x) for complex x is not linearly reducible to the evaluation of exp(x) for real x.) 
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6 Upper and lower bounds 

In this section we give some upper and lower bounds on t n (A), t n (M), t„(exp) and i n (sin). Since 
the multiplicative constants are not specified, the bounds apply equally well to the operations 
which are linearly equivalent to addition, multiplication, etc. (see Sections 2 to 5). The lower 
bounds are trivial: t n ( ^) > C2\t n (M) > c 22 i n (A) > Q377 (from (2.2), Lemma 3.1 and Theorem 
5.1). The upper bounds are more interesting. 

UPPER BOUNDS ON t n (M) 

The obvious algorithm for multiplication of multiple-precision numbers gives 

t n (M) < c 24 n 2 , (6.1) 

but this is not the best possible upper bound. Karatsuba and Ofman [12] showed that 

t n (M) < c 25 n L58 - , (6.2) 

where 1.58 .. . = log 2 3. The idea of the proof is that, to compute 

(a + Xb) (c + Xd) = ac+X(ad + be) + X 2 bd , (6.3) 

where A is a suitable power of two, we compute the three products m\ = ac, tt7 2 = bd, and 
771,3 = (a + b)(c + d), and use the identity 

ad + be = 777,3 — ( m i + m 2) ■ (6-4) 

Thus, 2n-bit integers can be multiplied with three multiplications of (at most) (77 + l)-bit inte- 
gers, some multiplications by powers of two, and six additions of (at most) 4n-bit integers. This 
observation leads to a recurrence relation from which (6.2) follows. 

More complicated identities like (6.4) may be used to reduce the exponent in (6.2). Recently 
Schonhage and Strassen [23] showed that the exponent can be taken arbitrarily close to unity. 
Their method gives the best known upper bound 

t n {M) < c 26 ti log (77) log log (77) , (6.5) 

and uses an alogrithm related to the fast Fourier transform to compute certain convolutions. 
For a description of this and earlier methods see Knuth [13 (revised)]. Knuth conjectures that 
(6.5) is optimal, though the term loglog(n) is rather dubious. (It may be omitted if a machine 
with random-access memory of size 0(n p ) for some fixed positive p is assumed.) From results 
of Morgenstern [19] and Cook and Aanderaa [8], it is extremely probable that 

lim tJM)/n = 00 , (6.6) 

n->oo 

which implies that M ^ A, but more work remains to be done to establish this rigorously. 
UPPER BOUNDS ON t„(exp) AND i n (sin) 

To evaluate exp(x) to precision 77 from the power series 

00 

cxp(±x) = Y^{±xy/j\ , (6.7) 
3=0 

it is sufficient to take ^777/ log(n) terms, so 

^n(exp) < c 28 tn(M)n/log{n) . (6.8) 
Theorem 6.1 shows that the bound (6.8) may be reduced by a factor of order ^/n/log(n). 
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Theorem 6.1 

(exp) < (M) (6.9) 

and 

*n(sin) < c 30 \/^t„(M) . (6.10) 

Proof. To establish (6.9), we use the identity 

exp(x) = (exp(x/A)) A (6.11) 
with A = 2 q , where q = |_ri a J . If [a,b] is the domain of x, and c = max(|a|, \b\), then 

\(x/X) r /r\\ < 2"« r , (6.12) 

if r is large enough that 

c r < r! . (6.13) 

Hence, it is sufficient to take r = \n/q] terms in the power series for exp(x/A) to give an abso- 
lute error of order 2~ n in the approximation to exp (a;/ A). Since exp(x/A) is close to unity, the 
relative error will also be of the order 2~ n for large n. From (6.11), q squarings may be used to 
compute exp(x) once exp(x/A) is known. 

The method just described gives exp(x) to precision n — , for the relative error in exp(x/A) 
is amplified by the factor A. This may be avoided by taking r = \n/q] +1, and either working 
with precision n + na, or evaluating 

r 

exp(|x/A|) -1^5^ \x/\\ j /j\ (6.14) 
j'=i 

and then using the identity 

(l + e) 2 -l = 2e + e 2 (6.15) 

to evaluate exp(|x|) — 1 without appreciable loss of significant figures. Thus, (6.9) follows (using 
Lemma 3.2 if necessary). 

The proof of (6.10) is similar, using the identity 

sin(x) = ±2sin(x/2)^/l-sin 2 (x/2) , (6.16) 
q times to reduce the computation of sin(x) to that of sin(x/A) (recall Lemma 3.9). 

REMARKS. If x is a rational number with small numerator and denominator, the time re- 
quired to sum r terms in the power series for exp(x/A) is 0(rn), and the time required for 
q squarings is O (qt n (M)). Thus, choosing r = [y/t n (M)\ and q = \n/r] gives total time 

It is also possible to evaluate exp(x) in this time for general x, by using a form 
of preconditioning to reduce the number of multiplications required to evaluate the power series 
for exp(x/A). 

A NUMERICAL EXAMPLE 

The following example illustrates the ideas of Theorem 6.1. Suppose we wish to calculate e to 
30 decimal places. The obvious method is to use the approximation 
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28 

e~£l/j! (6.17) 

3=0 

(since 29! ~ 8.8 x 10 30 ). On the other hand 

(\ 256 
g^j 

also gives the desired accuracy (since 11! 256 10 ~ 4.8 x 10 31 ). Thus, the computation of 18 
inverse factorials may be saved at the expense of 8 squarings. 

Similarly, the computation of e to 10 6 decimal places by the obvious method requires the sum 
of about 205,030 inverse factorials, but the approximation 

1819 ^ \ 

£ 7^18207 > (6-19) 

requiring only 1820 terms and 1820 squarings, is sufficiently accurate. 
BASE CONVERSION 

Schonhage has shown that conversion from binary to decimal or vice versa may be done in time 
O [n (log(n)) 2 log (log(n))^ (see Knuth [13, ex. 4.4.14 (revised)]). We describe his method here, 
as a similar idea is used below to improve Theorem 6.1. 

Let f3 > 1 be a fixed base (e.g. (3 = 10), and suppose we know the base f3 representation of an 

t-i 

integer x, i.e. we know the digits do, ... , ck-i, where < di < (3 and x = Suppose that 

o 

n-bit binary numbers can be multiplied exactly in time M(n), where 

2M(n) < M(2n) (6.20) 

for all sufficiently large n. (This is certainly true if the Schonhage-Strassen method [13, 23] is 
used.) We describe how the binary representation of x may be found in time O (M(n) log(n)), 
where n is sufficiently large for x to be representable as an n-bit number (i.e. 2 n > /?*). 

Without changing the result, we may suppose t = 2 k for some positive integer k. Let the time 
for conversion to binary and computation of j3 2 be C(k). Thus, we can compute /3 i//2 and 

t/2-l t-i 

convert the numbers x\ = ^ dif3 l and X2 = ^ difi 1-1 ^ 2 to binary in time 2C(k — 1), and then 

o t/2 

x = xi + /3 t/2 x 2 and f$ = (/3 t/2 ) 2 may be computed in time 2M(n/2) + 0(n). Thus 

C(k) < 2C(k - 1) + 2M(n/2) + 0{n) , (6.21) 

so 

C(k) < 2M(n/2) + 4M(n/4) + 8M(n/8) + . . . + 0(n log(n)) 

< 0(M(n)log(n)) (6.22) 

(using (6.20)). 
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The proof that conversion from base 3 to base (3 may be done in time (6.22) is similar, and once 
we can convert integers it is easy to convert floating-point numbers. 

COMPUTATION OF e AND vr 

We may regard e — 2 = 1/2! + 1/3! + . . . as given by a mixed-base fraction 0.111 . . ., where the 
base is 2,3,... Hence, it is possible to evaluate e to precision n, using a slight modification of 
the above base-conversion method, in time 0(M(n) log(n)). 

Similarly, artan(l/j) may be computed to precision n in time 0(M(n) log 2 (n)), for any small 
integer j > 2, and then tt may be computed from well-known identities such as Machin's 

vr = 16 artan(l/5) - 4 artan(l/239) . (6.23) 

The methods just described are asymptotically faster than the 0(n 2 ) methods customarily used 
in multiple-precision calculations of e and it (see, for example, Shanks and Wrench [25,26]). It 
would be interesting to know how large n has to be before the asymptotically faster methods 
are actually faster. A proof that even faster methods are impossible would be of great interest, 
for it would imply the transcendence of e and it. 

IMPROVED UPPER BOUNDS ON t n (exp) AND t n (sm) 

The following lemma uses an idea similar to that described above for base conversion and com- 
putation of e. 

Lemma 6.1 If p and q are positive integers such that p 2 < q < 2 n , then exp(p/q) may be 
computed to precision n in time 0(M(n) log(n)). 

Proof. The approximation 

k 

eMp/q) c^J2^- (6.24) 



is sufficiently accurate if k is chosen so that 



Since p 2 < q, (6.25) gives k\q k l 2 < 2 n , so certainly 

k\q k < 2 2n , (6.26) 

Hence, a method like that described above for the computation of e may be used, and (6.26) 
ensures that the integers in intermediate computations do not grow too fast. 

From Lemma 6.1 it is easy to deduce Theorem 6.2, which is an improvement of Theorem 6.1 for 
large n. The methods used in the proof of Theorem 6.1 and the following remarks are, however, 
faster than that of Theorem 6.2 for small and moderate values of n. 

Theorem 6.2 // M(n) satisfies (6.20) then 

tn(exp) < c 32 M(n) log 2 (n) (6.27) 

and 

tn(sin) < c 33 M(n) log 2 (n) . (6.28) 
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Proof. Without affecting the result (6.27) we may assume that n = 2 k for some positive 
integer k. (This assumption simplifies the proof, but it is not essential.) Given an n-bit fraction 
x G [0, 1), we write 

k 

z = J>*M, (6.29) 

i=0 

where q\ = 2 2 ' and < pi < 2 2 * 1 for i = 0, 1, . . . , k. By Lemma 6.1, exp(pi/qi) can be computed, 
to sufficient precision, in time 0(M(n) log(n)), so 

k 

exp(x) = JJexp(pj/%) (6.30) 

i=0 

can be computed in time 0(M(n)(log(n)) 2 ). This establishes (6.27), and the proof of (6.28) is 
similar. 

Corollary 6.1 

t„(exp) < c 34 n(log(n)) 3 loglog(n) (6.31) 

and 

^ (sin) < c 35 n(log(n)) 3 loglog(n) . (6.32) 



Proof. This is immediate from the bound (6.5) and Theorem 6.2. 
Corollary 6.2 

tn (E [a , b ](f)) < c 36 n(log(n)) 3 loglog(n) , 

where 

f(x) = log(x), exp(x), sin(x), cos(x), tan(x), sinh(x), 
cosh(x), tanh(x), arsin(x), artan(x), arsinh(x), 

etc, and [a, b] is any finite interval on which f(x) is bounded. 

Proof. This follows from (6.5), Corollaries 5.1 (and the note following), 5.2, 6.1, and Lemma 3.2. 

7 Best constants for operations equivalent to multiplication 

In this section, we consider in more detail the relationship between the mp operations D, I, M, R, 
and S defined in Section 3. It is convenient to consider also the operation Q of forming inverse 
square roots (i.e., y ^— x~a). From Theorem 3.1, if we can perform any one of these operations 
(say Y) to precision n in time t n (Y), then the time required to perform any of the other opera- 
tions to precision n is at most a constant multiple of t n (Y). 

Definition 7.1 Cxy is the minimal constant such that, for all positive e and all sufficiently 
large n, the operation X can be performed (to precision n) in time (Cxy + ^)t n (Y) if Y can be 
performed in time t n (Y), where X, Y = D,I, M, Q, R or S. 

The following inequalities are immediate consequences of Definition 7.1: 

CxyCyz > Cxz (7.1) 
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and 



CxyCyx > Cxx = 1 • (7.2) 

ASSUMPTIONS 

To enable us to give moderate upper bounds on the constants Cxy, it is necessary to make the 
following plausible assumption (compare (4.3), (6.20)) throughout this section: for all positive 
a and e, and all sufficiently large n, 

t an {Y) < (a + e)t n (Y) (7.3) 
for Y = D,I, M, Q, R and S. We also assume (6.6). 

Table 7.1 gives the best known upper bounds on the constants Cxy ■ Space does not permit a 
detailed proof of all these upper bounds, but the main ideas of the proof are sketched below. 

TABLE 7.1 Upper bounds on C X y 





X = D 




M 


Q 


R 


S 


Y = D 


1.0 


1.0 


2.0 


3.0 


2.0 


2.0 


I 


7.0 


1.0 


6.0 


15.0 


14.0 


3.0 


M 


4.0 


3.0 


1.0 


4.5 


5.5 


1.0 


Q 


10.0 


4.0 


6.0 


1.0 


5.0 


3.0 


R 


7.5 


6.0 


6.0 


3.0 


1.0 


3.0 


S 


7.5 


5.5 


2.0 


7.0 


9.0 


1.0 



Cim < 3 

Use the Newton iteration 

x i+ i = Xi - Xi(axi - 1) (7.4) 

to approximate 1/a using multiplications. At the last iteration it is necessary to compute axi 
to precision n, but Xi(axi — 1) only to (relative) precision n/2. Since the order of convergence 
is 2, the assumptions (7.3) (with a = \) and (6.6) give 

C IM <{l + \){l + \ + \ + . ..) = ?,. (7.5) 

Cqm < 4.5 

Use the third-order iteration 

x i+ i =Xi- \xi (si - \ef) (7.6) 

where 

Si = axf - 1 (7.7) 

to approximate a~2. At the last iteration it is necessary to compute axf to precision n, ef to 
precision n/3, and xi (e^ — |e|) to precision 2ra/3. Thus 

Cqm < (2 + | + §)(! + I + | + ...) = § . (7.8) 

Note that this bound is sharper than the bound Cqm < 5 which may be obtained from the 
second-order iteration 
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Crd < 2 

Use Newton's iteration 

Xi+i = \{xi + a/xi) (7-10) 

to approximate yfa. 
Cms < 2 

This follows from (3.19) and our assumptions. 
C IS < 5.5 

Use the third-order iteration 

x i+1 = Xi - Xi (ei - ef) (7.11) 

where 

Ei = axi-1 (7.12) 

to approximate 1/a. 
Cqs<7 

Use the third-order iteration (7.6). 
C S i<3 

Prom the proof of Lemma 3.7, 

t n/3 (S) < t n (I) + 0(n) . (7.13) 

The result follows from the assumption (7.3) with a = 3. (This is the first time we have used 
(7.3) with a > 1. The assumption is plausible in view of the Schonhage-Strassen bound (6.5).) 
Upper bounds on Csq and Csr follow similarly. 

Cmi < 6 

This follows from (7.1) and our bounds on Cms anci Csi- Similarly for the bounds on Cmq, 
Cmr and Cri. 

Cqr < 3 

Use the identity 

a -h = I (v^TA - v^A) + O (A 2 /a 5/2 ) , (7.14) 
where A is a power of 2 such that 

2 -n/3-l < x j a < 2 l-n/3 _ ^ ^ 
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Thus 



t 2n/3 (Q)<2t n (R) + 0(n) , (7.16) 

and the result follows from (7.3). 
C DR < 7.5 
Use the identity 

b/a = j (Va 2 + Xb - Va 2 - Wj + O (X 2 b 3 /a 5 ) , (7.17) 
where A is a power of 2 such that (for 6/0) 

2 -n/3-l < Xb / a 2 < 2 l-n/3 _ 



Thus 

and the result follows. 
C IR < 6 

so 



t 2n /3(D) < t n (S) + 2t n (i2) + 0(n) , (7.19) 



a" 1 = (a 2 )~5 , (7.20) 



C/H < C SR + C QR <6. (7.21) 

The bound on Cjq also follows from (7.20), and then the bound on Crq follows from 
= (a -1 ) 
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8 Comparison of some mp methods for nonlinear equations 

In this section, we briefly consider methods for finding multiple-precision solutions of non-linear 
equations of the form 

f(x) = 0, (8.1) 

where f(x) can be evaluated for any x in some domain. Additional results are given in [38]. 

There are many well-known results on the efficiency of various methods for solving (8.1), e.g., 
Hindmarsh [10], Ostrowski [20], Traub [27] and the references given in Section 1, but the results 
are only valid if arithmetic operations (in particular the evaluation of f(x),f'(x) etc.) require 
certain constant times. The examples given below demonstrate that different considerations are 
relevant when multiple-precision arithmetic of varying precision is used. 

For simplicity, we restrict attention to methods for finding a simple zero £ of / by evaluating 
/ at various points. We assume that / has sufficiently many continuous derivatives in a neigh- 
bourhood of (, but the methods considered do not require the evaluation of these derivatives. 

Since f(x) is necessarily small near £, it is not reasonable to assume that f(x) can be evaluated 
to within a small relative error near £. In this section, an evaluation of / "with precision n" 
means with an absolute error of order 2~ n . We suppose that such an evaluation requires time 
w(n) = t n (E(f)), where 

w{cn) ~ c a w{n) (8.2) 
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for some constant a > 1 and all positive c. Since a > 1, the bound (6.5) and condition (8.2) 
give 

lim t n (M)/w(n) = , (8.3) 

so we may ignore the time required for a fixed number of multiplications and divisions per iter- 
ation, and merely consider the time required for function evaluations. Our results also apply if 
a = 1, so long as (8.3) holds. (For example, the evaluation of exp(x) by the method of Corollary 
6.1 requires time w(n) ~ C37n(log(n)) 3 log log(n), which satisfies (8.2) with a = 1, and also 
satisfies (8.3).) 

Definition 8.1 If an mp zero-finding method requires time t{n) ~ C{a)w(n) to approximate 
( / with precision n, where w(n) and ( are as above, then C(a) is the asymptotic constant 
of the method. (Not to be confused with the asymptotic error constant as usually defined for 
fixed-precision methods [2].) 

Given several mp methods with various asymptotic constants, it is clear that the method with 
minimal asymptotic constant is the fastest (for sufficiently large n) . The method which is fastest 
may depend on a, as the following examples show. 

DISCRETE NEWTON mp METHODS 

Consider iterative methods of the form 

x i+ i =Xi - f(xi)/gi , (8.4) 

where gi is a finite-difference approximation to f'{xi). If Si = \x{ — C\ is sufficiently small, f(xi) 
is evaluated with absolute error O (ef) , and 

gi = f'(xi) + 0( £i ) , (8.5) 

then 

\x i+1 -C\=0 (e?) , (8.6) 

so the method has order (at least) 2. 

The simplest method of estimating f'(xi) to sufficient accuracy is to use the one-sided difference 

„ fjxj + hj) - f(xj) 

91 ~ K ' (8 - 7) 

where hi is of order e,, and the evaluation of f{xi + hi) and /(xj) are performed with an absolute 
error O (ef). Thus, to obtain £ to precision n by this method (Ni), we need two evaluations of 
/ to precision n (at the last iteration), preceded by two evaluations to precision re/2, etc. (The 
same idea is used above, in the proof of Theorem 5.2.) The time required is 

t{n) ~ 2w(n) + 2io(n/2) + 2w{n/A) + ... . (8.8) 

Thus, from (8.2) and Definition 8.1, the asymptotic constant is 

C Nl (a) = 2(1 + 2~ a + 2~ 2a + ...) = 2/(1 - 2~ a ) . (8.9) 

Since 

2<C JVl (a)<4, (8.10) 

the time required to solve (8.1) to precision n is only a small multiple of the time required to 
evaluate / to the same precision. The same applies for the methods described below. 
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Using (8.7) is not necessarily the best way to estimate f'(xi). Let p be a fixed positive integer, 
and consider estimating f'(xi) by evaluating / at the points 

Xi - [p/2\hi,Xi - ([p/2\ - l)hi, ...,Xi+ \p/2]hi . 

(The points need not be equally spaced so long as their minimum and maximum separations are 
of order hi.) Let gi be the derivative (at Xi) of the Lagrange interpolating polynomial agreeing 
with / at these points. Since estimates /'(xj) with truncation error O (h?), we need hi of order 

e l / p . Then, to ensure that (8.5) holds, the function evaluations at the above points must be 

made with absolute error O ^£* +1//p ^ • Thus to obtain £ to precision n by this method (N p ) we 

need one evaluation / to precision n and p evaluations to precision nil + l/p)/2, preceded by 
one evaluation precision n/2 and p to precision n(l + l/p)/4, etc. The asymptotic constant is 

C N (p, a)=(l+p (?±±) a )/(l- 2- a ) . (8.11) 



2p 



Let 



Cn{o) = min Cn{p, «) , (8.12) 

p=l,2,... 

so the "optimal mp discrete Newton method" has asymptotic constant Cat (a). From (8.11), the 
p which minimizes Cjy(p, a) also minimizes p l / a (l + 1/p), so the minimum for a > 1 occurs at 
p = [a — lj or \a — 1] . In fact, p = 1 is optimal if 

1 < a < log(2)/ log(4/3) = 2.4094 . . . , (8.13) 

and p > 2 is optimal if 

logfl-p- 1 ) ^(l+p- 1 ) 

log(l-p- 2 ) log(l + l/(p(p + 2))) ' { ■ ) 

The result that method N2 is more efficient than method Ni if q > 2.4094 ... is interesting, 
for A^2 requires one more function evaluation per iteration than N±, and has the same order 
of convergence. The reason is that not all the function evaluations need to be as accurate for 
method N2 as for method N\. We give below several more examples where methods with lower 
order and/or more function evaluations per iteration are more efficient than methods with higher 
order and/or less function evaluations per iteration. 

For future reference, we note that 

1 < C N (a) < 4 , (8.15) 



CW(1) = 4, (8.16) 

and 

C N {a) - 1 ~ ea2~ a (8.17) 

as a — > 00. 

A CLASS OF mp SECANT METHODS 

It is well-known that the secant method is more efficient that the discrete Newton method 
for solving nonlinear equations with fixed-precision arithmetic [2, 20]. For mp methods the 
comparison depends on the exponent a in (8.2). 
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Let fc be a fixed positive integer and pk the positive real root of 

x k+l = 1 + x k_ (g_ 18 ) 

The iterative method S k is defined by 

xw = x * - fix * ] (/(^-/S-,)) ' (8 - 19) 

where the function evaluations are performed to sufficient accuracy to ensure that the order 
of convergence is at least pk- Thus, Si is the usual secant method with order p\ = = 
1.618 . . . ; S2, S3 etc. are methods with lower orders P2 = 1.4655 . . . ,ps = 1.3802 . . ., etc. With 
fixed-precision Si is always preferable to £2, S3 etc., but this is not always true if rap arithmetic 
is used. 

Suppose i and k fixed, 5 > small, and write e = \xi_k — C\ and p = Pk — Since the order of 
convergence is at least p, we have 

k«-C| = o(/) , (8.20) 

k+i-Cl=0(> fc+1 ) , (8.21) 

\xi - Xi- k \ = 0(e) , (8.22) 

and 

\f( Xi )\ = o(eP k ) . (8.23) 

For the approximate evaluation of the right side of (8.19) to give order p, the absolute error in 
the evaluation of f(xi) must be O (^£ pk+1 ^j , and the relative error in the evaluation of ( f(xi) — 

(fc-|-l h \ 
e p ~ p J, so the absolute error in the evaluation of f(xi-k) 

must be O [e pk+1 ~ pk . From (7.18), for 5 sufficiently small, 

p k+1 -p k + l>p, (8.24) 

so the evaluation of ( to precision n by method Sk requires evaluations of / to precision 
n,n/p,n/p 2 , . . . ,n/p h ~ 1 ,2n/p k+1 ,2n/p h+2 , etc. Thus, the asymptotic constant is 

C s (k, a) = l+p- a + ...+ p^ a + (2p-( fc+1 )) Q (l + p- a + . . .) 

1 _ p -ka + ( 2 p-(fc+l))a 
1 -p- a ' 

where (after letting S — > 0) p = pk satisfies (8.18). 



(8.25) 



We naturally choose k to minimize Cs(k,a), giving the "optimal mp secant method" with 
asymptotic constant 

C s (a) = mm C s (k,a) . (8.26) 

The following lemmas show that the optimal secant method is Si if a < 4.5243. . ., and S2 if 
a > 4.5243... 

Lemma 8.1 

C s (k,l) = 3 + p k k -p k . (8.27) 



22 



Proof. Easy from (8.18) and (8.25). 



Lemma 8.2 



as a — > oo. 

Proof. From (8.25), 



C s (k,a) - l~p-«- p - k « + (2p~ k {k+1) Y (8.29) 



as a — > oo. If k > 2 then, from (8.18), 

p k k =Pk 1 +P k k ~ 1 >Pk 1 +Pk>2, (8.30) 

so 

1 > 2p fc • (8.31) 

Thus, the result for k > 2 follows from (8.29). The result for k = 1 also follows from (8.29), for 
2p- 2 = 3-y/E. 

Lemma 8.3 

C5(a) = ic 5 (2, a) ,/a>« , (8 - 32) 
where «o = 4.5243 . . . is the root of 

C s (l,a ) = C s (2,a ) . (8.33) 



Proof. The details of the proof are omitted, but we note that the result follows from Lemmas 
8.1 and 8.2 for (respectively) small and large values of a. 

From (8.25), Cs{k,a) is a monotonic decreasing function of a, so the same is true of Cs(a). 
Thus, from Lemmas 8.1, 8.2 and 8.3, 

1 < C s (a) < 3 , (8.34) 



C s (l) = 3 , (8.35) 

and 

C s (a) - 1 ~ P2 a = (0-6823 . . .) a (8.36) 

as a oo. Comparing these results with (8.15) to (8.17), we see that the optional mp secant 
method is more efficient than the optimal mp discrete Newton method for small a, but less 
efficient for large a. (The changeover occurs at a = 8.7143 . . .) 

AN mp METHOD USING INVERSE QUADRATIC INTERPOLATION 

For fixed-precision arithmetic the method of inverse quadratic interpolation [2] is slightly more 
efficient than the secant method, for it has order Pq = 1.8392 . . . > 1.6180 . . ., and requires the 
same number (one) of function evaluations per iteration. For mp arithmetic, it turns out that 
inverse quadratic interpolation (Q) is always more efficient than the secant method S±, but it is 
less efficient than the secant method S2 if ct > 5.0571 . . . 
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Since the analysis is similar to that for method Si above, the details are omitted. The order pq 
is the positive real root of 

x 3 = l + x + x 2 . (8.37) 
For brevity, we write a = 1/pq = 0.5436 . . . 

To evaluate £ to precision n by method Q requires evaluations of / to precision n, (1 — a + a 2 )n, 
and (7 J (1 — a — a 2 + 2a 3 )n for j = 0, 1, 2, . . . Hence, the asymptotic constant is 

C Q (a) = 1 + (1 - a + a 2 ) a + (1 - a - a 2 + 2a 3 ) a /(l - a a ) 

= 1 + (1 - a + <r 2 ) Q + (3<j 3 )7(1 - a a ) (8.38) 

from (8.31). Corresponding to the results (8.15) to (8.17) and (8.34) to (8.36), we have that 
Cq(o) is monotonic decreasing, 

1 < Cq{o) < C Q (1) = i(7 - 2(7 - a 2 ) = 2.8085 . . . , (8.39) 

and 

C Q {a) - 1 ~ (1 - a + a 2 ) a = (0.7519 . . .) a (8.40) 

as a — > oo. Method Q is more efficient than the optimal mp secant method if a < 5.0571 . . ., 
and more efficient than the optimal mp discrete Newton method if a < 7.1349. . . We do not 
know any mp method which is more efficient than method Q for a close to 1. 

OTHER mp METHODS USING INVERSE INTERPOLATION 

Since inverse quadratic interpolation is more efficient than linear interpolation (at least for a 
close to 1), it is natural to ask if inverse cubic or higher degree interpolation is even more ef- 
ficient. Suppose ^ < fi < 1, and consider an inverse interpolation method 1^ with order 
In particular, consider the method 1^ which uses inverse interpolation at Xj, Xi-i, . . . , Xi-k to 
generate Xi + i, where k is sufficiently large, and the function evaluations at Xj, . . . , Xi-k are suffi- 
ciently accurate to ensure that the order is at least and, in general, no more than 1/ fi. (The 
limiting case I1/2 is the method which uses inverse interpolation through all previous points 
xo, xi, . . . ,Xi to generate Xj+i.) 

By an analysis similar to those above, it may be shown that the asymptotic constant of method 
III is 

00 

Cj(n,a)=J2(sj(riT , (8-41) 

j=0 

where so(fi) = 1 and 

8j (jj) = max [^-i(fi), 1 + - Ml " M')/(l " M)] (8-42) 

for j = 1,2,... Space does not allow a proof of (8.41), but related results are given in [20, 
Appendix H]. We note the easily verified special cases 

CVI^tA o j -Cs(l-n) (8..I:!) 




and 

(8.44) 
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The method with maximal order (see [7]) is Ii/ 2 , with asymptotic constant 

oo 

C i a,a) = Y J (j^) a - (8-45) 

J'=2 

The "optimal inverse interpolatory method" is the method 1^ with fj,(a) chosen to minimize 
Ci((x,a), so its symptotic constant is 

C I {a)= min Ci(jt,a) . (8.46) 

The following lemma shows that the optimal choice is /x = a, corresponding to the inverse 
quadratic method Q discussed above, if a < 4.6056 . . . 

Lemma 8.4 IfCi(a) = Ci(fi(a),a) then 

M (a) = a = 0.5436 ... if 1 < a < 4.6056 . . . , (8.47) 
fi(a) is a monotonic decreasing function of a, and 

lim^oo n(a) = \ . (8.48) 

From (8.39), 

C 7 (i,a)-l~(|) a (8.49) 

as a —7- oo, so Lemma 8.4 shows that the optimal inverse interpolatory is more efficient than 
methods Si and Q (as expected), but less efficient than method S2 or the optimal discrete New- 
ton method, for large a. In fact Cj(a) < Csifii) for 1 < a < 5.0608 . . . 

A LOWER BOUND FOR C(a) 

The following theorem shows that C(a) > 1 for all useful mp methods. The results above 
(e.g. (7.17)) show that the constant "1" here is best possible, as methods with C(a) -> 1 as 
a — > 00 are possible. The minimal value of C(a) for any finite a is an open question. 

Theorem 8.1 If an mp method is well-defined and converges to a zero of the functions fi(x) = 
F(x) — y and f2(y) = F^^iy) — x, where x and y are restricted to nonempty domains D x and 
D y , and F is some invertible mapping of D x onto D y such that t n (E(F)) satisfies (8.2), then 
the asymptotic constant of the method satisfies C{a) > 1. 

Proof. If (7(a) < 1 then, by solving fi(x) = 0, we can evaluate F 1 ^ -1 ^?/) (for y in D y ) in time 
less than t n (E(F)), for all sufficiently large n. Applying the same argument to f'2(y), we can 
evaluate F = (F^ 1 ))^ 1 ) in time less than £ n (F(F( -1 ))). Hence, for large n we have 

t n {E(F)) < t n (F(F(- x ))) < t n (E{F)) , (8.50) 

a contradiction. Hence, C{a) > 1. 

Conjecture 8.1 For all mp methods (using only function evaluations) which are well-defined 
and convergent for some reasonable class of functions with simple zeros, 

C(a) > 1/(1 -2~ a ) . (8.51) 



25 



SUMMARY OF mp ZERO-FINDING METHODS 



Of the methods described in this section, the most efficient are: 

1. optimal inverse interpolation, if 1 < a < 5.0608. . . (equivalent to inverse quadratic inter- 
polation, if 1 < a < 4.6056 . . .) ; 

2. optimal secant method (method S 2 ), if 5.0608 ... <a< 8.7143 . . . ; 

3. optimal discrete Newton, if 8.7143 ... < a. 

For practical purposes, the inverse quadratic interpolation method is to be recommended, for 
it is easy to program, and its asymptotic Cq(o) is always within 3.2% of the least constant for 
the methods above. Numerical values of the asymptotic constants, for various values of a, are 
given to 4D in Table 8.1. The smallest constant for each a is italicized. 

TABLE 8.1 Aysmptotic constants for various mp methods 



a C N (a) C s (l,a) C s (2,a) C Q (a) d(a) d{\,a) 



1.0 


4. 


.0000 


3 


,0000 


3 


.6823 


2. 


.8085 


2. 


.8085 


3 


.0000 


1.1 


3. 


.7489 


2 


.8093 


3 


,4256 


2. 


.6484 


2. 


.6484 


2 


.8193 


1.5 


3. 


.0938 


2 


.2987 


2. 


.7241 


2. 


.2108 


2. 


.2108 


2 


.3219 


2.0 


2. 


.6667 


1 


.9443 


2 


,2209 


1. 


.8954 


1. 


.8954 


1 


.9630 


3.0 


2. 


.1071 


1 


.5836 


1 


.6935 


1. 


.5586 


1. 


.5586 


1 


.5856 


4.0 


1, 


.6988 


1 


.3988 


1. 


,4248 


1. 


.3789 


1. 


.3789 


1 


.3898 


5.0 


1, 


.4260 


1 


.2860 


1. 


,2694 


1 


,2677 


1. 


.2676 


1 


.2718 


6.0 


1, 


.2529 


1 


.2105 


1 


.174-1 


1 


,1936 


1 


,1930 


1 


.1946 


7.0 


1, 


.1469 


1 


.1573 


1. 


.1131 


1 


,1420 


1 


.1410 


1 


.1416 


8.0 


1, 


,0838 


1 


.1185 


1 


.0748 


1 


.1051 


1 


.1039 


1 


.1041 


9.0 


1. 


.0471 


1 


.0898 


1. 


.0495 


1 


.0782 


1 


.0770 


1 


.0771 


10.0 


1. 


.0262 


1 


.0682 


1. 


.0328 


1 


.0584 


1 


,0573 


1 


.0573 


15.0 


1 


,0012 


1 


.0176 


1 


.0043 


1 


.0139 


1 


.0134 


1 


.0134 


20.0 


1 


.0001 


1 


.0046 


1 


.0006 


1 


.0033 


1 


.0032 


1 


.0032 



NOTE ADDED IN PROOF. Theorem 6.2 and its corollaries may be improved by a factor 
log(n), as described in [37] and [38]. 



26 



References 



[I] Brent, R.P. The computational complexity of iterative methods for systems of nonlin- 
ear equations. In Complexity of Computer Computations (edited by R.E. Miller and 
J.W. Thatcher). Plenum Press, New York, 1972, 61-71. 

[2] Brent, R.P. Algorithms for Minimization without Derivatives. Prentice-Hall, Englewood 

Cliffs, New Jersey, 1973. 
[3] Brent, R.P. Some efficient algorithms for solving systems of nonlinear equations. SIAM 

Numer. Anal. 10, 327-344, 1973. 
[4] Brent, R.P. On the precision attainable with various floating-point number systems. 

IEEE Trans. Comp. C-22, 601-607, 1973. 
[5] Brent, R.P. Error analysis of algorithms for matrix multiplication and triangular de- 
composition using Winograd's identity. Numer. Math. 16, 145-156, 1970. 
[6] Brent, R.P. Numerical solution of nonlinear equations. Computer Sci. Dept., Stanford 

University, March 1975, 189 pp. 
[7] Brent, R.P., Winograd, S. and Wolfe, P. Optimal iterative processes for rootfinding. 

Numer. Math. 20, 327-341, 1973. 
[8] Cook, S.A. and Aanderaa, S.O. On the minimum complexity of functions. Trans. Amer. 

Math. Soc. 142, 291-314, 1969. 
[9] Floyd, R.W. Unpublished notes. 

[10] Hindmarsh, A.C. Optimality in a class of rootfinding algorithms. SIAM J. Numer. Anal. 
9, 205-214, 1972. 

[II] Hopcroft, J.E. Complexity of computer computations. In Information Processing 74- 
North-Holland, Amsterdam, 1974, 620-626. 

[12] Karatsuba, A. and Ofman, Y. Multiplication of multidigit numbers on automata (Rus- 
sian). Dokl. Akad. Nauk SSSR 145, 293-294, 1962. 

[13] Knuth, D.E. The Art of Compter Programming, Vol. II, Seminumerical Algorithms. 
Addison Wesley, Reading, Massachusetts, 1969. Errata and addenda: Report CS 194, 
Computer Sci. Department, Stanford University, 1970. 

[14] Kung, H.T. The computational complexity of algebraic numbers. SIAM I. Nu- 
mer. Anal, (to appear). 

[15] Kung, H.T. A bound on the multiplicative efficiency of iteration. J. Computer & System 
Sciences 7, 334-342, 1973. 

[16] Kung, H.T. and Traub, J.F. Optimal order of one-point and multipoint iteration. 
J. ACM 21, 643-651, 1974. 

[17] Kung, H.T. and Traub, J.F. Computational complexity of one-point and multipoint 
iteration. In Complexity of Real Computations (edited by R. Karp). Amer. Math. Soc, 
Providence, Rhode Island, 1974, 149-160. 

[18] Kung, H.T. and Traub, J.F. Optimal order and efficiency for iterations with two eval- 
uations. Tech. Report, Department of Computer Science, Carnegie-Mellon University, 
1973. 

[19] Morgenstern, J. The linear complexity of computation. J. ACM 20, 305-306 (1973). 
[20] Ostrowski, A.M. Solution of Equations in Euclidean and Banach Spaces. Academic 
Press, New York, 1973. 

[21] Paterson, M.S. Efficient iterations for algebraic numbers. In Complexity of Computer 
Computations (edited by R.E. Miller and J.W. Thatcher). Plenum Press, New York, 
1972, 41-52. 

[22] Rissanen, J. On optimum root-finding algorithms. J. Math. Anal. Applies. 36, 220-225, 
1971. 



27 



[23] Schonhage, A. and Strassen, V. Schnelle Multiplikation grosser Zahlen. Computing 7, 
281-292, 1971. 

[24] Schultz, M.H. The computational complexity of elliptic partial differential equations. 

In Complexity of Computer Computations (edited by R.E. Miller and J.W. Thatcher). 

Plenum Press, New York, 1972, 73-83. 
[25] Shanks, D. and Wrench, J.W. Calculation of it to 100,000 decimals. Math. Comp. 16 

76-99, 1962. 

[26] Shanks, D. and Wrench, J.W. Calculation of e to 100,000 decimals. Math. Comp. 23 
679-680, 1969. 

[27] Traub, J.F. Iterative Methods for the Solution of Equations. Prentice-Hall, Englewood 

Cliffs, New Jersey, 1964. 
[28] Traub, J.F. Computational complexity of iterative processes. SI AM J. Computing 1, 

167-179, 1972. 

[29] Traub, J.F. Numerical mathematics and computer science. Comm. ACM 15, 537-541, 
1972. 

[30] Traub, J.F. Optimal iterative processes: theorems and conjectures. In Information 
Processing 71 . North-Holland, Amsterdam, 1972, 1273-1277. 

[31] Traub, J.F. Theory of optimal algorithms. In Software for Numerical Mathematics 
(edited by D.J. Evans). Academic Press, 1974. 

[32] Traub, J.F. An introduction to some current research in numerical computational com- 
plexity. Tech. Report, Department of Computer Science, Carnegie-Mellon University, 
1973. 

[33] Winograd, S. On the time required to perform addition. J. ACM 12, 277-285, 1965. 
[34] Winograd, S. On the time required to perform multiplication. J. ACM 14, 793-802, 
1967. 

[35] Wozniakowski, H. Generalized information and maximal order of iteration for operator 

equations. SI AM J. Numer. Anal. 12, 121-135, 1975. 
[36] Wozniakowski, H. Maximal stationary iterative methods for the solution of operator 

equations. SIAM J. Numer. Anal. 11, 934-949, 1974. 
[37] Brent, R.P. Fast multiple-precision evaluation of elementary functions. J. ACM 23, 

242-251, 1976. 

[38] Brent, R.P. Multiple-precision zero-finding methods and the complexity of elementary 
function evaluation. In Analytic Computational Complexity (edited by J.F. Traub). 
Academic Press, New York, 1975, 59-73. 



28 



Postscript (September 1999) 



Historical Notes 

This paper was retyped (with minor corrections) in M?eX during August 1999. It is available 
electronically from http : //wwwmaths . anu . edu . au/ -brent /pub/pub032 . html 

The related paper Brent [38] is available electronically from 
http : / / wwwmaths . anu . edu . au/ -brent /pub/ pub028 . html 

The paper Kung [14] appeared in SI AM J. Numer. Anal. 12 (1975), 89-96. 
Sharper Results 

Some of the constants given in Table 7.1 can be improved, e.g. Com, Crm, CdSi Crs- One 
source of improvement is given in a report by Karp and Markstein 2 . 

For example, consider Com- We want to compute an n-bit approximation to b/a. If Xi — > 1/a 
as in (7.4) and we define yi = bxi, then yi — > b/a. Also, if Xi satisfies the recurrence (7.4), then 
yi satisfies 

Vi+i =V%- Xi(ayi - b) . (7.4') 

Note that (7.4') is self-correcting because of the computation of the residual ayi — b. Suppose 
Xi has (relative) precision n/2. If we approximate yi = bxi using an §-bit multiplication, 
compute the residual ayi — b using an n-bit multiplication, then its product with Xi using an 
§-bit multiplication, we can apply (7.4') to obtain y,i + \ with relative precision n. Assuming Xi 
is obtained in time ~ 3M(n/2) ~ |M(n) (see (7.5)), the time to obtain yi + \ is ~ |M(n), i.e. 
Cdm < 3.5, which is sharper than the bound Cdm < 4.0 given in Table 7.1. 

Similarly, we can obtain Crm < 4.25, which is sharper than the bound Crm < 5.5 given in 
Table 7.1. If Xj — > a -1 / 2 and yi = axi — > yfa, we compute a precision n/2 approximation Xi in 
time ~ |M(n/2) as in Section 7, then apply a final second-order iteration for 

J/i+i = J/i - - o)/2 (7.9') 

(derived by multiplying (7.9) by a and using (7.7)) to obtain a precision n approximation 
to \fa. 

As a corollary, the time required for an arithmetic-geometric mean iteration [37,38] is reduced 
from ~6.5M(n) to ~5.25M(n). 

The Definition of n-bit Multiplication 

Our t n (M) (see Sections 1-3) is essentially the time required to compute the most significant n 
bits in the product of two n-bit numbers. In Brent [38], t n (M) is written as M(n). A related 
but subtly different function is M*(n), defined as the time required to compute the full 2n-bit 
product of n-bit numbers 3 . Paul Zimmermann 4 observed that smaller constants can sometimes 

Alan H. Karp and Peter Markstein, High Precision Division and Square Root, HP Labs Report 93-93-42 (R.l), 
June 1993, Revised October 1994. Available electronically from 
http : //www . hpl . hp . com/techreports/93/HPL-93-42 . html 

3 In Brent [37] we (confusingly) used the notation M(n) for M*(n). 

4 Personal communication, 1999. 
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be obtained in row Y = M of Table 7.1 if we use M*{n) instead of M(n). (We denote these 
constants by Cxm* to avoid confusion with the Cxm of Table 7.1.) For example, Com* < 3.5 
and C RM * < 4.25. 

It is an open question whether 

M(n) ~ M*(n) as n ->■ oo ; 
with the best available multiplication algorithms (those based on the FFT) this is true 5 . 

Final Comments 

Daniel Bernstein 6 observed that the time required to compute re-bit square roots can be reduced 
further if the model of computation is relaxed so that redundant FFTs can be eliminated. Similar 
remarks apply to division, exponentiation etc (and to operations on power series). 

In conclusion, 25 years after the paper was written (in 1974), improvements can still be found, 
and the last word is yet to be written! 



Similar remarks apply if we consider computing the product of two polynomials of degree n — 1, and ask either 
for the first n terms in the product or the complete product. Although the first computation is faster (by a factor 
of about two) if the classical order n 2 algorithms are used, it is not significantly faster if FFT-based algorithms 
are used. 

6 Personal communication, f999. 
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